Abstract. Real world events are driven by a mixture of both centralized and distributed control of 
individual agents based on their situational context and internal makeup. For example, some people 
have partial allegiances to multiple, contradictory authorities, as well as to their own goals and principles. 
This can create a cognitive dissonance that can be exploited by an appropriately directed psychological 
influence operation (PSYOP). An Autonomous Dynamic Planning and Execution (ADP&E) approach is 
proposed for modeling both the unperturbed context as well as its reaction to various PSYOP 
interventions. As an illustrative example, the unrest surrounding the Iranian elections in the summer of 
2009 is described in terms applicable to an ADP&E modeling approach. Aspects of the ADP&E modeling 
process are discussed to illustrate its application and advantages for this example. 


Introduction 

We propose using an Autonomous Dynamic 
Planning and Execution (ADP&E) approach that 
integrates both a centralized and distributed 
planning control capability to more realistically 
model complex social group interactions. Our 
recent survey of implemented models within 
social science found that they do not successfully 
model future influence operations because they do 
not integrate enough cognitive realism in each 
automated human (agent) to represent real world 
conditions and events. This makes the current 
models unsuitable for large-scale, complex 
problem domains. More specifically, 
implemented models fail to capture several 
aspects of human behavior because these 
models do not include the ability to adjust to very 
large, partially observable, and uncertain 
environments, nor use human abilities in 
dynamic planning to maintain agility in these 
ever-changing environments. 

In addition, many techniques assume a 
completely distributed (decentralized) approach 
that uses simplified cognitive agents with common 
goals to create swarm-like behavior [1]. This leads 
to emergent events when the cumulative cognitive 
state reaches a tipping point. In the same context, 
other techniques rely on completely centralized 
control of agents to optimize their coordination and 
lead to more optimal strategies of cooperative 
event behavior, which can suspend reactions of 
discontent and generate strong unified positions [2]. 
Both of these approaches are goal-directed, but the 
centralized approach relies more on reputational or 
social utility, while the distributed approach relies 
more on intrinsic or expressive (i.e., individual or 
psychological) utility. 

Real world events are actually driven by a mixture 
of both centralized and distributed control of 
individuals (agents) based on their situational 
context and internal makeup. Given the level and 
type of education, age, interests, experiences, 
religious affiliation, economic status, etc., 
individuals have varying degrees of both 
centralized and distributed behavioral influences 
that either enhance or detract from their current 
environmental status or cross-cut their current 
environmental circumstances. For example, some 
people may have partial allegiances to multiple 
contradictory authorities (e.g., religion vs. science, 
dictatorship vs. democracy), which could create 
cognitive dissonance within these people. 

This could further create an opportunity for 
change, given their uncertainty about their future and 
their willingness to seek change from their current 
conditions. Does this form an opportunity for 
external forces to intervene and pursue a 
psychological influence operation (PSYOP) to 
redirect the event toward a change beneficial to their 
interests, or does meddling at such a time backfire, 
strengthen the opposition’s claims, and 
perhaps tip the balance in our adversaries’ favor? 
An autonomous dynamic planning and execution 
(ADP&E) framework has been built that includes 
variability in searching, selecting, and rewarding 
plans based on both individual and group 
behavior. Difficult questions such as the PSYOP 
question above can be addressed in modeling 
and simulation if centralized and distributed 
planning are successfully integrated within the 
model via this ADP&E framework. Such models can 
better represent the balance of centralized 
and distributed planning-influence control, and 
its sensitivity can be further understood by 
simulating interactions among similar and differing 
social groups with differing parameter sets. 

Background 

Currently implemented cognitive approaches 
can be analyzed from a game theory perspective 
to determine their problem domain footprint. On 
the one hand, reactive planning algorithms, such 
as temporal difference reinforcement learning, 
can learn two-player stochastic games, such as 
Backgammon [3]. On the other hand, deep 
search algorithms, such as decision-tree search 
using alpha-beta pruning, can plan many moves 
ahead for a two-player deterministic game, such 
as chess [4]. However, note that these games 
are both two-player and fully observable, while 
the real world has many players and is only partially 
observable. Further, hybrid solutions have been 
proposed to handle more complex real world and 
game problems [5]. We propose using a more 
powerful hybrid approach that integrates more 
realistic features of social interaction by 
extending an ADP&E approach with both a 
centralized and distributed planning capability. 

An illustrative example will be investigated to 
better model and predict cumulative behavior 
amongst more cognitively realistic agents based 
on their interaction. The analyzed example will 
be akin to the situation surrounding the 2009 
Iranian elections, where there was a ruling 
faction and a dissenting faction in conflict. The 
ruling faction has some centralized authority for 
control of individuals and the dissenting faction 
also has some centralized authority for control of 
individuals. In addition, the individuals have 
some intrinsic freedom to choose the centralized 
control or act more independently among 
themselves. There are pressures from both sides 
(rulers or dissenters) and in both directions 
(centralized and distributed). 

We can enhance a current city simulation with 
some new features to better realize the behavior 
portrayed by the media. A small city has already 
been implemented for game playing multi-agent 
scenarios that includes movement models and 
line-of-sight. Agents can move based on 
prescribed waypoints and connections and 
observe based on proximity and line-of-sight. 
Communication connectivity can be added to the 
model for simulating the short-range (e.g., 
talking, signaling), mid-range (e.g., megaphone, 
video recording) and long-range (e.g., internet, 
cell phone) communication channels. The ruling 
authority can cut some communication as they 
did in Iran, but the dissenting faction can adapt 
their behavior by using alternative forms of 
communication. Also, peaceful and violent 
behavior can be exhibited from both sides, and 
scaling of confrontations can be investigated. 
However, individuals and group behaviors and 
communications will be limited to both simplify 
and exemplify the approach. 
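
The three communication ranges described above could be represented along the following lines. This is only a minimal sketch under stated assumptions, not the implemented simulation: the numeric range limits, the `Network` class, and the `usable` method are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass, field
from math import hypot

# Hypothetical range limits in meters; the paper gives no numeric values.
CHANNEL_RANGE_M = {
    "short": 20.0,          # talking, signaling
    "mid": 200.0,           # megaphone, video recording
    "long": float("inf"),   # internet, cell phone
}

@dataclass
class Network:
    # Channels the ruling authority has cut (assumed here to apply city-wide).
    cut: set = field(default_factory=set)

    def usable(self, sender, receiver):
        """Return the channels that still connect two (x, y) positions."""
        dist = hypot(sender[0] - receiver[0], sender[1] - receiver[1])
        return [name for name, limit in CHANNEL_RANGE_M.items()
                if name not in self.cut and dist <= limit]

net = Network()
a, b = (0.0, 0.0), (150.0, 0.0)
before = net.usable(a, b)   # mid-range and long-range channels both reach
net.cut.add("long")         # the authority cuts internet and cell service
after = net.usable(a, b)    # only the mid-range channel remains
```

In this toy case, two agents 150 m apart lose their long-range link when service is cut and must fall back to a mid-range channel, which is the kind of adaptation the dissenting faction is expected to show.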

A design and implementation strategy has been 
studied for the election defiance scenario in Iran. 
This paper describes an approach to 
implementing such a simulation and describes 
the benefits of such a system. 

Approach 

We describe here a five step approach to 
designing, implementing, and demonstrating a 
social science simulation to study the causal 
precursors that drive the effects in the current 
situation in Tehran, where protests continue 
sporadically against the conservative regime. 

1. A baseline is necessary to allow interaction 
among actors. This has been accomplished 
using technologies that form urban 
environments into game models [6]. Figure 1 
provides a simple viewpoint of a small city 
model with a variety of connected waypoints 
(not illustrated). 

2. The players of the simulation or game need to 
be identified. In the case of the Iranian 
situation, eight player types are identified and 
described. 

3. Each player must have enough planning ability 
to interact with the other players in a similar 
environment and illustrate realism in thought 
processes and ability to reassess and change 
strategies. This can be accomplished by 
integrating intrinsic-, extrinsic-, and 
expressive-utility in each player, and this is 
described from each player’s point of view. 
These utilities are implemented via a value 
function that is an integral part of the ADP&E 
system. 

4. The interactions must be identified according 
to the current power structure and number of 
agents under each authoritarian player. The 
interactions are identified in Figure 2 and each 
interactive link will be described in detail. 

5. Each player is identifiable as a planner in an 
ADP&E system, where their plans and 
perceptions impact all players involved 
simultaneously, and where higher-order effects 
are plausible and likely. In other words, within 
each planner, their parameters dictate their 
behavior and interaction in an attempt to 
maximize their own utility, while readjusting 
their plans to counter other planners’ activities. 
Once implemented, parameters can be tuned to 
illustrate social behavior on a more complex 
scale. 

Step 1: Urban Environmental Game Models 

In previous work, an automated technique was 
developed to generate an urban terrain 
movement model for computer gaming from a 
Compact Terrain DataBase (CTDB) and to increase 
the simulation speed of operations to allow much 
faster than real time operations. A 
programming interface for planning algorithms 
has also been defined to integrate multiple planners 
into the model. An example city model is shown 
in Figure 1. 



Figure 1. Example City Game Board 


To better understand the order of magnitude of 
this city model, Figure 1 shows a top-down 
picture of the terrain model used. The model is a 
small city of approximately 4 km x 5 km. More 
specifically, there are 3649 buildings with over 
12,000 floor locations. There were over 31,000 
waypoints generated for this terrain model. 
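
Movement over connected waypoints can be illustrated with a small graph search. The sketch below is an assumption-laden toy, not the CTDB-derived model: the five waypoints, their connections, and the `route` function are made up for illustration, and breadth-first search stands in for whatever movement planner the real system uses.

```python
from collections import deque

# Toy waypoint graph: keys are waypoint ids, values are connected waypoints.
# The real model has over 31,000 waypoints; these five are invented.
WAYPOINTS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def route(graph, start, goal):
    """Breadth-first search for a route with the fewest waypoint hops."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk parent links back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None  # goal is not reachable from start

path = route(WAYPOINTS, "A", "E")
```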

Step 2: Major Game Players 

There are five major players in the election 
situation in Iran, where the people are protesting 
against the election results, which appear to be 
drastically different from what prior polls indicated. The 
five major players in this conflict are: the 
supreme leader Ayatollah Ali Khamenei, who 
backs the government-declared incumbent 
President Mahmoud Ahmadinejad; the leading 
challenger Mir Hossein Mousavi; the general in 
charge of Iran's Revolutionary Guard, 
Mohammad Ali Jafari; the religious hierarchy; 
and the people. 

The supreme leader is a 70-year-old cleric. He 
reigns over Iran's Islamic system as part pope, 
part commander in chief and as a one-man 
supreme court. President Mahmoud Ahmadinejad 
was the winner of the June 12, 2009 election. He 
is an ultra-conservative who has isolated Iran 
from the rest of the world through condemnations 
of the United States, Israel, and United Nations. 
The president is backed by the supreme leader 
and is a puppet, so he is not considered a player 
here. Mohammad Ali Jafari oversees the 125,000 
members of Iran’s military. This revolutionary 
guard (RG) takes direct orders and is considered 
the strong arm of the supreme leader. The 
religious hierarchy is under direction of the 
supreme leader as well, but some clerics are 
asking for reform and a recount of the election. 
Thus, we have broken this group into two groups, 
a clerical reform player and a clerical 
conservative player. The people are by far the 
largest player in this conflict. This group can be 
divided into three camps: the conservatives that 
side with the incumbent, the reformists that side 
with the reform party, and the people that want to 
remain neutral. 

As an assumption, some players are considered 
as single agent planners, such as the supreme 
leader, the reform leader, and the religious 
clerics. The remaining two planners are the 
revolutionary guard and the people. These 
planners require many agents in order to show 
the escalation of the conflict. The proper ratio is 
not known but there are over 7 million people 
living in Tehran and only 125 thousand guards in 
the entire country. However, the guards are well 
trained and armed. There are more players in the 
Iranian election situation than the ones described 
here, but these eight should be enough to 
sufficiently simulate the conflict. 
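
The eight players and the single-agent versus multi-agent assumption can be written down as a simple roster. This is a sketch only: the `Player` class and its `multi_agent` flag are hypothetical, and the head-count ratio computed below is just the quotient of the two figures quoted in the text, not a recommended agent ratio, which the text says is not known.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Player:
    name: str
    multi_agent: bool  # True if the planner directs many agents

# Player names follow Table 1; the flags follow the stated assumption.
PLAYERS = [
    Player("Supreme Leader", False),
    Player("Reform Party", False),
    Player("Revolutionary Guard", True),
    Player("Religious Hierarchy Conservatives", False),
    Player("Religious Hierarchy Reformists", False),
    Player("People Conservatives", True),
    Player("People Neutral", True),
    Player("People Reformists", True),
]

# Head counts quoted in the text; the proper simulation ratio is not known.
TEHRAN_POPULATION = 7_000_000
GUARD_MEMBERS = 125_000
people_per_guard = TEHRAN_POPULATION // GUARD_MEMBERS

multi = [p.name for p in PLAYERS if p.multi_agent]
```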


Players\Metrics                   | Intrinsic Utility                  | Expressive Utility               | Reputation Utility
Supreme Leader                    | Suppress Protests                  | Zero Tolerance/ Block Some Media | Treated As God/ Can Do Little Wrong
Reform Party                      | Ignite Protests/ Avoid Violence    | Keep Reform Movement Alive       | Adjust to People’s Needs
Revolutionary Guard               | Take Orders                        | Use Force                        | Never Show Fear
Religious Hierarchy Conservatives | Make People Subservient            | Teach Religious Obedience        | Back Religious Beliefs
Religious Hierarchy Reformists    | Gain Power                         | Demand Recount/ Reject Violence  | Empathize/ Gain People's Favor
People Conservatives              | Follow Religion Verbatim           | Demand Others to Follow          | Hard Working/ Poorer Class
People Neutral                    | Follow leader and keep low profile | Avoid areas of conflict/ Be Safe | Maintain Respect/ Peace
People Reformists                 | Believe Reform Will Help Economy   | Instigate Protests/ Free Speech  | Defend Women/ Debate/ Dialogue

Table 1. Players and Their Utility Metrics 


Step 3: Utility 

To appreciate the escalation of the conflict in Iran 
three measures of utility can be used for each 
player: intrinsic, expressive, and reputational 
utility. Intrinsic utility is the measure of what that 
player thinks is important and wants to 
accomplish. Expressive utility is the measure of 
how a player will deliver their message. 
Reputational utility is how the player perceives 
other players’ opinion of their actions. 

These players’ metrics are shown in Table 1. 
This table is a qualitative description of the utility 
metrics. In an implementation, these metrics 
must be translated into some quantitative form 
that is reflected in their agents’ actuators and 
sensors. For instance, the revolutionary guard’s 
reputational utility is to never show fear, so its 
agents will never retreat when confronted, in 
order to maintain fear among the people. 
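
One simple way to turn the qualitative metrics of Table 1 into a quantitative value function is a weighted sum of the three utilities. The sketch below assumes that form purely for illustration; the paper does not specify the functional form, and the weights, the plan scores, and the names `UtilityWeights` and `plan_value` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class UtilityWeights:
    intrinsic: float
    expressive: float
    reputational: float

def plan_value(weights, scores):
    """Weighted sum of a plan's three utility scores (each assumed in [0, 1])."""
    return (weights.intrinsic * scores["intrinsic"]
            + weights.expressive * scores["expressive"]
            + weights.reputational * scores["reputational"])

# Hypothetical weights for the revolutionary guard: reputation dominates.
guard = UtilityWeights(intrinsic=0.3, expressive=0.2, reputational=0.5)

# Hypothetical scores for two candidate plans when confronted by protesters.
hold_ground = {"intrinsic": 0.8, "expressive": 0.7, "reputational": 1.0}
retreat = {"intrinsic": 0.2, "expressive": 0.1, "reputational": 0.0}

candidates = [("hold_ground", hold_ground), ("retreat", retreat)]
best = max(candidates, key=lambda item: plan_value(guard, item[1]))[0]
```

With these made-up numbers the guard prefers to hold its ground, matching the "never show fear" entry in Table 1. Tuning the weights per player is one place where the parameter learning discussed later could apply.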
Step 4: Interactions 

Player interactions are too numerous to build a complete 
model of the Iranian election conflict. However, a 
simplified interactive model can be created if 
assumptions are made. Figure 2 shows such a 
simplified representation. The interactions are 
labeled one to thirteen, with interactions six and 
seven expanded for the multiple religious 
hierarchy players and people players, 
respectively. 



Connection 1 in Figure 2 is the supreme 
leader contemplating plans to suppress the 
protests, his intrinsic utility goal. Connection 2 is 
the supreme leader giving direction to the religious 
hierarchy, especially Ayatollah Ahmad Jannati 
Massah, who heads Iran’s 12-member Guardian 
Council, which certifies election results and is 
closely allied with Khamenei. Connection 3 is the 
limitations imposed on the reform party by the 
supreme leader. Many times these directions are 
ignored, such as a direction not to attend a 
religious rally to honor the dead. Connection 4 is 
the interaction between the people and the 
supreme leader. The supreme leader demands no 
protests, and many people defy him by attending 
rallies. Connection 5 is the supreme leader’s use 
of the revolutionary guard (RG) to forcibly take to 
the streets and break up protests. The RG also 
acts as an agent that attempts to cut 
communication by confiscating cell phones and 
detaining people. Connections 6a-c are the 
religious hierarchy contemplating plans to either 
gain power (reformist group) or maintain 
allegiance to the supreme leader (conservative 
group). Connections 7a-f are the interactions 
among the people. The conflict among the people 
escalated into violence in the first few days of 
protests. Connection 8 is the reform party 
contemplating plans as things unfold. For 
instance, the reform party decided to have large 
events centered on honoring the dead, which 
appealed to many people and created large 
crowds. Connection 9 is the interaction between 
the people and the reform party. They worked 
together to create large peaceful protests that 
further aggravated the supreme leader. 
Connection 10 is the mixed messages received 
from the clerics: some sided with the supreme 
leader, while others demanded a vote recount or 
a voided election. Connection 11 exemplifies the 
conflict between the protesters and the RG. 
Many people have been killed or arrested in this 
conflict, which is driven by the unwillingness of 
both sides to back down. Connection 12 
represents the RG contemplating maneuvers to 
break up protests, raid reformists’ homes, 
confiscate communication devices, and detain 
uncooperative people. Finally, connection 13 is 
the RG’s attempt to subdue the reform party, 
such as by keeping its members from going to 
rallies. 
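
The thirteen connections described above can be captured as a small labeled graph. The following is a sketch based only on the text, since Figure 2 is not reproduced here: the sub-links 6a-c and 7a-f are collapsed into single entries, the edge directions are assumptions, the receiver of connection 10 is assumed to be the people, and the identifiers `CONNECTIONS` and `links_of` are hypothetical.

```python
# Connection id -> (source player, target player), read from the text.
# A self-loop means the player is contemplating its own plans.
# Directions are assumed; 6 and 7 stand for 6a-c and 7a-f respectively.
CONNECTIONS = {
    1: ("supreme_leader", "supreme_leader"),
    2: ("supreme_leader", "religious_hierarchy"),
    3: ("supreme_leader", "reform_party"),
    4: ("supreme_leader", "people"),
    5: ("supreme_leader", "revolutionary_guard"),
    6: ("religious_hierarchy", "religious_hierarchy"),
    7: ("people", "people"),
    8: ("reform_party", "reform_party"),
    9: ("people", "reform_party"),
    10: ("religious_hierarchy", "people"),  # receiver is an assumption
    11: ("people", "revolutionary_guard"),
    12: ("revolutionary_guard", "revolutionary_guard"),
    13: ("revolutionary_guard", "reform_party"),
}

def links_of(player):
    """Return the connection ids that touch a given player."""
    return [cid for cid, (src, dst) in CONNECTIONS.items()
            if player in (src, dst)]

# Connections where a player plans internally rather than interacting.
self_loops = [cid for cid, (src, dst) in CONNECTIONS.items() if src == dst]
reform_links = links_of("reform_party")
```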

Step 5: ADP&E System 

The proven approach used here has five tiers, 
from the inner cycle of dynamic planning, 
executing, and assessing plans for players and 
agents, through the highest level, adapting 
players’ strategies using tournament play 
through multiple games. Figure 3 illustrates this 
ADP&E implementation framework. 

This system concept was built from the ground 
up to be an efficient and modular approach. This 
approach has already been applied to two 
applications: the game RISK [7] and an urban 
search and rescue operation [8]. 
■ First, the core cycle was developed as an 
action and response system, where individual 
action sequences are planned, executed, and 
assessed in various model environments, with 
varying projected expectations, over many 
cycles, and for all agents in the correct time 
sequence. 

■ At the second level, agents execute a 
particular plan, and each agent’s action set is 
stored separately for modularity. 

■ Third, the player is the conceiver and 
conductor of a plan that encompasses all 
agent activities. A player has a set of 
parameters that determine its choice of 
planned actions, and how often to re-plan 
those actions. 


■ Fourth, a game is the domain where action 
sequences are executed in the model 
environments, which will always lead to a final 
goal state. The final goal state must be 
achievable, because human intervention is 
prohibited in this framework and a game only 
completes when the final goal is achieved. 


■ Fifth, tournaments of games are arranged, so 
that players can improve their parameter 
settings over the course of many tournaments. 
Through evaluating each player’s progress, 
and modifying the best players’ parameters, 
players can improve their play. 
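
The fifth tier, improving player parameters over many tournaments, can be sketched as a keep-the-best-and-mutate loop. This is an illustrative stand-in, not the tuning technique of [7] or [8]: the single `aggression` parameter, the `play_game` scoring function with its hidden optimum, and the mutation size are all assumptions.

```python
import random

def play_game(aggression):
    """Stand-in for a full game: score peaks at a hidden best setting (0.6)."""
    return -(aggression - 0.6) ** 2

def run_tournaments(population, rounds, rng):
    """Keep the best half each round and refill with mutated copies of it."""
    for _ in range(rounds):
        ranked = sorted(population, key=play_game, reverse=True)
        survivors = ranked[: len(ranked) // 2]
        # Mutate each survivor slightly, clamped to the range [0, 1].
        children = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.05)))
                    for p in survivors]
        population = survivors + children
    return max(population, key=play_game)

rng = random.Random(0)          # fixed seed so runs are repeatable
start = [0.1, 0.2, 0.9, 1.0]    # initial parameter settings of four players
best = run_tournaments(start, rounds=20, rng=rng)
```

Because the best settings always survive a round, the top score can never get worse from one tournament to the next; how quickly it improves depends on the assumed mutation size.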


[Figure 3 tiers, outermost to innermost: Tournaments and rounds; Games or Simulations; Player(s) or Planner(s); Agents (type and number); Core Cycle: (1) Planning, (2) Execution, (3) Assessment, back to (1)] 

Figure 3. ADP&E Framework 



At the heart of this approach is a core planning 
cycle for each of the eight players of the game. 
Figure 3 shows an illustration of this cycle. The 
core cycle has three components: (1) plan-generator 
(PG); (2) plan-executor (PE); and (3) 
plan-assessor (PA). The plan-generator is 
considered the search engine for contemplating 
plans for each player. PG strings together 
individual actions to form plans for each agent 
based on the current perception of the situation. The 
utility metrics described above can be used to 
evaluate plans and choose the better ones. 
Formulations as to how to generate and choose 
plans have been examined on two very large 
planning problems and are described in two 
previous papers [7] [8]. The plan-executor 
executes the plans in time sequential order. The 
plan-assessor estimates how well the remaining 
plan will execute given new observed information 
acquired from the environment while executing 
the plan. This cycle can be run after each 
executed action. 



Figure 4. Planning Core Cycle 


The three components use three objects that are 
manipulated and shared among the components. 
These three objects are the (1) plans, (2) 
models, and (3) expectations. Plans are 
generated by PG, executed by PE and assessed 
by PA. All players can be run in separate threads 
and execute independently. City Models are 
used in PG to predict future states, are used in 
PE to observe the real states, and are used in 
PA to observe whether expectations will be met. 
The models used in PG and PA are virtual-state 
city models, which approximate the real-state 
model used in PE. The real-state model is a 
real-world model, where a plan is executed. 
Virtual-state models do not know the real states 
until observed and are initialized to reasonable 
expectations. Thus, there are nine perceptions of 
the city model based on which planner is under 
consideration. There is one virtual model for 
each planner and a real-world model where all 
planners can execute their actions. Expectations 
are the measure of how well a plan achieves a 
desired goal (utility metrics), such as breaking up 
a protest. Expectations are projected both by the 
generated plan in PG and by the plan used in 
PA. The two expectations are compared to see if 
the expectations projected in PA still meet or 
exceed the originally generated plan 
expectations projected in PG. Each agent has an 
expectation for its plan. If expectations are met to 
a prescribed degree, a plan is retained; 
otherwise a plan is reformulated in PG. 
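
The interplay of plans, models, and expectations can be sketched for a single planner as follows. This is a toy under stated assumptions, not the ADP&E implementation: the two-stage rally goal, the success probabilities, the product-of-probabilities expectation, and the 0.8 tolerance are all hypothetical, and for simplicity the assessment here runs on each new observation just before the next action is executed.

```python
from math import prod

# Hypothetical two-stage goal for a dissenting planner: announce, then gather.
STAGES = [["internet", "word_of_mouth"], ["square", "university"]]

def expectation(plan, model):
    """Projected chance that every remaining action succeeds under a model."""
    return prod(model[action] for action in plan)

def generate(stage, virtual):
    """PG: for each remaining stage, pick the action believed most likely to work."""
    return [max(options, key=virtual.get) for options in STAGES[stage:]]

def core_cycle(virtual, real, tolerance=0.8):
    """Alternate assessment and execution, replanning when expectations slip."""
    log, stage = [], 0
    plan = generate(stage, virtual)
    at_plan_time = dict(virtual)              # beliefs behind the PG expectation
    while plan:
        virtual[plan[0]] = real[plan[0]]      # observe the state the next action needs
        projected_pg = expectation(plan, at_plan_time)
        projected_pa = expectation(plan, virtual)
        if projected_pa < tolerance * projected_pg:
            plan = generate(stage, virtual)   # PA rejects the plan: back to PG
            at_plan_time = dict(virtual)
            log.append(("replan", tuple(plan)))
            continue
        log.append(("execute", plan.pop(0)))  # PE: carry out the next action
        stage += 1
    return log

# Virtual-state model: the planner's initial beliefs about each action.
virtual = {"internet": 0.9, "word_of_mouth": 0.6,
           "square": 0.8, "university": 0.7}
# Real-state model: the authority has cut internet service.
real = dict(virtual, internet=0.1)
log = core_cycle(virtual, real)
```

In this example the planner first intends to announce over the internet, observes that service is cut, and replans to word of mouth before executing, so the log records one replan followed by two executed actions.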

If implemented, such a simulation tool can 
provide three major advantages. First, tuning 
parameters is crucial to matching historical 
records. Three aspects of each of the eight 
players are critical: the versatility in choosing 
alternative actions under uncertainty (e.g., 
reformist people were younger and more educated 
and used high tech devices for communications, 
something the leaders did not consider in their 
initial plans); the timing of actions and responses 
(e.g., the government lost credibility by declaring 
the election valid without taking any time to 
investigate); and the amount of reassessment and 
replanning (e.g., people switched to alternative 
forms of communication, such as Twitter and cell 
phones, when services were cut). These are just 
three instances where agile planning is used in 
real world social events, and there are many other 
areas to investigate. Thus, tuning planner 
parameters in key aspects is essential to 
matching real world scenarios. The parameters 
can be tuned via techniques already established 
for two other applications [7] [8]. 

The second advantage is the use of an ADP&E 
system to predict how real-time events will 
unfold. When a model has been developed that 
accurately predicts the evolution of historical 
events for a culture as described above, it can be 
tuned to follow the course of current events and 
could predict their future development with less 
uncertainty. These predictions can be further fine 
tuned to account for shifting alliances and 
priorities. Once a baseline of activity has been 
established, the ability to identify underlying 
causes such as those that lead to unexpected 
results is valuable information in itself. 

The third advantage of such a simulation tool is 
to inject possible outside influences into the 
model and see if and how they alter the course 
of events. Models such as these could self train 
to produce the most desirable effects with the 
smallest perturbations. Further, trained models 
may be examined to determine which observations 
of the evolving environment are most useful for 
determining whether plan expectations are being met. 
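
The idea of searching for the smallest perturbation that produces a desired effect can be illustrated with a much simpler stand-in than ADP&E. The sketch below swaps in a basic threshold (tipping-point) cascade model: each agent joins once the active share of the population reaches its personal threshold. The thresholds, the choice to seed the most receptive agents first, and the function names are all assumptions made for illustration.

```python
def cascade(thresholds, seeded):
    """Iterate a threshold model until stable; return how many agents are active."""
    active = set(seeded)
    n = len(thresholds)
    changed = True
    while changed:
        changed = False
        share = len(active) / n   # active share at the start of this pass
        for i, t in enumerate(thresholds):
            if i not in active and share >= t:
                active.add(i)
                changed = True
    return len(active)

def smallest_perturbation(thresholds, target):
    """Find the fewest seeded agents (lowest thresholds first) that reach the target."""
    order = sorted(range(len(thresholds)), key=lambda i: thresholds[i])
    for k in range(len(thresholds) + 1):
        if cascade(thresholds, order[:k]) >= target:
            return k
    return None  # target exceeds the population size

# Hypothetical thresholds for six agents.
thresholds = [0.2, 0.2, 0.3, 0.5, 0.6, 0.9]
k = smallest_perturbation(thresholds, target=5)
```

With these made-up thresholds, influencing one agent changes nothing further, while influencing two tips five of the six agents, which shows how sharply outcomes can depend on the size of an injected influence.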

Summary 

This paper has proposed the application of 
ADP&E to modeling social influence in a 
combined centralized and distributed context. 
Individual agents have partial allegiances to one 
or more, potentially conflicting, central 
authorities, as well as their own internal goals 
and principles. Agents are not simply reactive, 
but proactively plan and execute action 
sequences in these contexts. ADP&E can 
provide a means of modeling the social forces at 
work within an individual agent, as well as the 
shifting allegiances and conflicts among agents. 
Into this complex, dynamic hierarchy, various 
PSYOP interventions can be injected, and the 
micro and macro reactions of the system 
observed. 

The unrest surrounding the Iranian elections in 
the summer of 2009 has been used as an 
illustrative example of ADP&E modeling. The 
defining elements of that situation have been 
deconstructed into items and relationships 
prerequisite for the formation of a model. 
Application of ADP&E to that model has served 
to explain the features of ADP&E, and describe 
its benefits for such social influence models. 